Protein folding potential functions

Author

Gordon M. Crippen

Abstract

There has been a great deal of recent activity on approaches to calculating protein folding using specially devised empirical potential functions. We have developed one such function that solves the protein structure recognition problem: given the sequence for a globular protein and a collection of plausible protein conformations, including the native conformation for that sequence, identify the correct, native conformation. Although it has been trained on only 58 single-chain proteins, it recognizes the native conformation for essentially all compact, soluble, globular proteins having known native conformations in comparisons with 10^4 to 10^6 reasonable alternative conformations apiece. Furthermore, it correctly discriminates between native and nonnative structures of multichain aggregates without additional information about disulfide bonds or bound ligands. Given its broad successes, we can use it to gain insight into the differences between several seemingly related computational problems.

1: Problem definition

There has been a lot of excitement in the recent literature about computer calculations that “fold up proteins”, particularly methods that employ specially designed potential functions that are not general-purpose molecular mechanics force fields, yet somehow incorporate information about protein folding. Along with all the excitement and optimistic claims of success has come a great deal of confusion over who has really done what, and what it means in other contexts. Our purpose here is not to give an authoritative review of the field, but rather to clear up some of the misconceptions and define some terms precisely enough to explain what we have been doing.

The long-term goal of many investigators has been the protein folding problem (FP): given only the amino acid sequence, calculate the detailed three-dimensional (3D) structure of the protein. However, this statement of FP is insufficient.
We must add the target accuracy of the prediction and a requirement of generality. The experimental answer the calculation is trying to match is almost always taken to be a high-resolution X-ray crystal or NMR structure. There is also consensus that the appropriate measure of protein conformational similarity is the root-mean-square deviation in Cα coordinates after optimal superposition by rigid-body translation and rotation (denoted here by RMSD). (We believe that the test of “topological similarity” is so ill-defined as to be a meaningless measure.) Unfortunately, there is no consensus about how small the RMSD between the calculated and crystal structure must be to count as success. We have recently proposed an objective RMSD cutoff between similar and dissimilar protein structures that depends on chain length, but is free of arbitrary decisions.[1] Using this criterion we find several cases where various authors have calculated the tertiary structure of small proteins nearly well enough, and one or two cases where they have barely succeeded. Yet no one has succeeded on more than one protein, to our knowledge.

This brings up the requirement of generality. A particular folding algorithm may inadvertently or intentionally incorporate information about the protein being predicted, or it may be subtly biased toward producing structures of that type. Success in FP must include the ability to function on a variety of different protein structural types, as well as extension beyond the set of proteins that may have been used to develop the method. Just how broad the range of proteins must be is up for debate, but we would propose that a successful method should work at least for α, β, and α/β types of globular, water-soluble proteins.

There is a second major goal that is of more recent interest than FP and opposite to it, namely the inverse folding problem (IFP). Here the intent is to calculate a sequence or sequences that will uniquely fold to a given 3D structure.
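As an aside on the RMSD measure defined earlier in this section (Cα coordinates after optimal rigid-body superposition), the calculation can be sketched as follows. This is a minimal illustration only: the Kabsch singular value decomposition approach, the function name `superposed_rmsd`, and the use of NumPy are assumptions chosen for the example and are not stated in the text.

```python
import numpy as np


def superposed_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper: rows are assumed to be C-alpha positions listed in
    the same residue order. Only proper rotations are allowed, so a mirror
    image is not treated as identical to the original.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Optimal translation: move both centroids to the origin.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)

    # Optimal rotation via SVD of the 3x3 covariance matrix (Kabsch).
    h = pc.T @ qc
    u, s, vt = np.linalg.svd(h)

    # d = -1 flags that the unconstrained optimum would be a reflection;
    # flipping the sign of the smallest singular value restricts the
    # solution to a proper rotation.
    d = 1.0 if np.linalg.det(u @ vt) >= 0.0 else -1.0

    # Minimal summed squared deviation expressed through singular values.
    e0 = np.sum(pc * pc) + np.sum(qc * qc)
    msd = (e0 - 2.0 * (s[0] + s[1] + d * s[2])) / len(p)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, while its mirror image should not. The chain-length-dependent cutoff mentioned in the text is not reproduced here because its form is not given in this section.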
In IFP the solution is not unique, as we know from the great similarity of the many mutant T4 lysozyme crystal structures, but otherwise the accuracy and generality issues are the same. The experimentally determined 3D structure of the designed amino acid sequence must be unique enough to crystallize, and must lie within the proposed RMSD limit compared to the given target structure. Furthermore, this must be generally true for some wide variety of proteins, particularly those that are significantly different from structures used to develop the method.

[Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS '95), 1060-3425/95 $10.00 © 1995 IEEE]

FP and IFP so formulated are probably distant goals, and we need easier problems for now, such that their solution will lead the way to success on the more difficult ones. Certainly if we can’t even distinguish between correctly and incorrectly folded structures of the same sequence, we have little hope of solving FP. Our research has therefore centered around what we will call the structure identification problem (3DID): given a particular amino acid sequence and a large collection of 3D protein structures of the correct chain length, one of which is the correct native structure, select that native structure. Once again, the problem statement needs to be refined as to accuracy and generality. Most investigators of 3DID have used a statistical treatment of accuracy by developing a scalar function of structure that can be used to rank all the structures given, and then noting that the native lies far out on the favorable end of the distribution.[2, 3, 4] We have adopted the much more stringent, nonstatistical requirement that the native must always be ranked first, just as the real protein folds to its one native structure and no other.
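The difference between the statistical criterion and the strict rank-first criterion described above can be made concrete with a small sketch. It assumes that lower scores are more favorable, which the text does not state, and the function names `native_ranks_first` and `z_score` are hypothetical; the scoring function itself is not specified here, so only precomputed scores are compared.

```python
from statistics import mean, pstdev


def native_ranks_first(native_score, alternative_scores):
    """Strict criterion: the native must beat every single alternative.

    Assumes lower scores are more favorable (an assumption of this sketch).
    """
    return all(native_score < s for s in alternative_scores)


def z_score(native_score, alternative_scores):
    """Statistical criterion: distance of the native from the mean of the
    alternatives, in units of their standard deviation.

    A strongly negative value means the native lies far out on the
    favorable end of the distribution.
    """
    sigma = pstdev(alternative_scores)
    if sigma == 0:
        raise ValueError("alternative scores have zero spread")
    return (native_score - mean(alternative_scores)) / sigma


# Illustrative numbers only: one alternative scores better than the native.
alternatives = [0.0, 1.0, -1.0, 2.0, -2.0, -12.0]
native = -10.0
print(z_score(native, alternatives) < -1.0)      # statistical view: favorable
print(native_ranks_first(native, alternatives))  # strict view: failure
```

With these made-up numbers the native looks well separated statistically, yet a single better-scoring alternative is enough to fail the rank-first test, which is why the strict requirement is the more demanding of the two.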
As for generality, there is the range of applicability concerning sizes and types of native proteins, as well as the range of nonnative structural types. We consider only native proteins that are compact, globular, and water soluble, consisting of one or more polypeptide chains of naturally occurring amino acids. They may be of any folding motif and have associated ions, small ligands, and prosthetic groups, but otherwise the set of polypeptide chains comprising the native structure must be able to fold up independently of other macromolecules. For example, if the experimentally stable state of a protein is the dimer, we must consider both chains at once, not just the monomer. As far as the diversity of the alternative structures goes, we assume that the obviously bad ones have already been rejected by some structure quality assessment program that looks for left-handed α-helices, van der Waals contacts, unusual φ/ψ values, etc. Otherwise, they may be compact or noncompact, similar to the native or very dissimilar. As does the Sippl group, we generate our alternatives by cutting out contiguous segments of polypeptide chain the length of the native from larger PDB entries.

The opposite of 3DID and a restriction of IFP is what we will call the sequence identification problem (SEQID): given a particular native sequence and its high-resolution 3D structure, select from a large set of sequences the one or more that will fold to the target structure. As in IFP, the answer is clearly not unique, given a large assortment of sequences, although the native sequence should certainly be one of the hits. A successful algorithm should be applicable to a broad class of native protein 3D types. The accuracy issue is not so straightforward. Suppose the algorithm ranks sequences according to their suitability for the target structure, which is a globin, for instance.
If the target is globin A, and it differs only a little in RMSD from globin B, then is ranking sequence B ahead of sequence A a mistake? Perhaps sequence B is more strongly biased toward the globin folding motif than sequence A is.

One further question of problem definition common to both 3DID and SEQID is the treatment of insertions and deletions. In the same way that permitting indels is essential to the success of sequence alignment algorithms, this is a reasonable feature of SEQID. Without it, one can well expect to identify only the native sequence, even when clearly homologous sequences are available for selection. Similarly, it has often been argued that 3DID must select the native structure on the basis of the conserved interior strands (the “core” residues) alone, so that if the assortment of structures to choose from does not happen to include the native, the algorithm will at least recognize some homologous structure. We present the argument in the next section that such a goal for a 3DID algorithm is incompatible with experiment and with the previously stated objectives of the problem. In any case, we have strictly considered 3DID without gaps of any sort.
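The gapless setting can be illustrated by the way alternative structures are produced, namely by cutting contiguous segments the length of the native out of longer chains. The sketch below is a hedged illustration: the function names `gapless_segments` and `count_alternatives` are hypothetical, and a chain is represented simply as a Python list with one entry per residue (for example, a Cα coordinate), which is an assumption of this example and not a description of the actual implementation.

```python
def gapless_segments(chain, length):
    """Yield (start, segment) for every contiguous window of `length`
    residues in `chain`; no insertions or deletions are allowed.

    `length` is assumed to be a positive integer. Chains shorter than
    `length` yield nothing.
    """
    for start in range(len(chain) - length + 1):
        yield start, chain[start:start + length]


def count_alternatives(library_chain_lengths, length):
    """Number of gapless segments of `length` residues obtainable from a
    library of chains, given only the chain lengths."""
    return sum(max(0, m - length + 1) for m in library_chain_lengths)


# A 10-residue chain offers 7 gapless placements of a 4-residue native.
toy_chain = list(range(10))  # placeholder residues 0..9
for start, segment in gapless_segments(toy_chain, 4):
    print(start, segment)
```

Each segment would then be paired residue by residue with the native sequence and scored. Because a chain of M residues contributes M - L + 1 segments for a native of length L, the number of alternatives grows quickly with the size of the structure library.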




Publication date: 1995
